<pdfDocument variable>.ToText (Function)

New WINDEV, WEBDEV and WINDEV Mobile 2024 feature!

In French: <Variable pdfDocument>.VersTexte

Extracts text from a PDF document.

Syntax

<Result> = <PDF document>.ToText([<Pages to extract>])

<Result>: Character string

Text extracted from the PDF document.

<PDF document>: pdfDocument variable

Name of the pdfDocument variable to be used.

<Pages to extract>: Optional character string

Range of pages from which the text must be extracted. The format is the same as the one used in standard print dialog boxes: individual page numbers or ranges of pages, separated by semicolons. For example, "1;3;4;6-10;12" means that pages 1, 3, 4, 6 to 10, and 12 will be processed.
If this parameter is not specified or if it corresponds to an empty string (""), all the pages are extracted.
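
To make the page-range format concrete, here is a minimal Python sketch that expands such a string into the list of page numbers it designates. It is only an illustration of the format described above, not the way WINDEV parses the parameter; the function name parse_page_ranges and its page_count argument are hypothetical and exist only for this example.

```python
def parse_page_ranges(spec, page_count):
    """Expand a page-range string such as "1;3;4;6-10;12" into page numbers.

    Hypothetical helper: 'page_count' is the total number of pages,
    used only when the string is empty (meaning "all pages").
    """
    # An empty string selects every page, like the documented default.
    if spec.strip() == "":
        return list(range(1, page_count + 1))

    pages = []
    for item in spec.split(";"):
        item = item.strip()
        if "-" in item:
            # A range such as "6-10" covers both bounds.
            first, last = item.split("-")
            pages.extend(range(int(first), int(last) + 1))
        else:
            # A single page number such as "3".
            pages.append(int(item))
    return pages


print(parse_page_ranges("1;3;4;6-10;12", 20))
# prints [1, 3, 4, 6, 7, 8, 9, 10, 12]
```

The sketch performs no validation (reversed ranges, duplicates, or pages beyond the document are not handled), since the help page does not state how such input is treated.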

Remarks

Conversion from PDF to text

The formatting of the document is lost when the conversion is performed from PDF to text.
The text is extracted in the order of appearance of the PDF commands and is written sequentially into the result string. The organization of the text in paragraphs and in blocks is kept (as well as the CR characters).
The Unicode characters are not returned.
The data found in a PDF form is not extracted (this data is not stored in the PDF file).

Special cases

PDFIsProtected is used to find out whether a password is required to open a PDF file.
PDFNumberOfPages returns the total number of pages found in a PDF file.

Business / UI classification: Business Logic

Component: wd290wdpdf.dll